Near-Optimal Entrywise Sampling for Data Matrices
Abstract
We consider the problem of selecting non-zero entries of a matrix $A$ in order to produce a sparse sketch of it, $B$, that minimizes $\|A - B\|_2$. For large $m \times n$ matrices such that $n \gg m$ (for example, representing $n$ observations over $m$ attributes), we give sampling distributions that exhibit four important properties. First, they have closed forms computable from minimal information regarding $A$. Second, they allow sketching of matrices whose non-zeros are presented to the algorithm in arbitrary order as a stream, with $O(1)$ computation per non-zero. Third, the resulting sketch matrices are not only sparse, but their non-zero entries are highly compressible. Lastly, and most importantly, under mild assumptions, our distributions are provably competitive with the optimal offline distribution. Note that the probabilities in the optimal offline distribution may be complex functions of all the entries in the matrix. Therefore, regardless of computational complexity, the optimal distribution might be impossible to compute in the streaming model.
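To make the streaming setting concrete, here is a minimal sketch of entrywise sampling in Python. It does not implement the distributions proposed in the paper, which the abstract does not spell out. It uses plain L1 sampling as a stand-in: each non-zero is kept independently with probability proportional to its magnitude and rescaled so that the sketch is unbiased. The function name `sketch_stream`, the `budget` parameter, and the assumption that the entrywise L1 norm of $A$ is known before the stream starts are all illustrative choices.

```python
import random

def sketch_stream(nonzeros, l1_norm, budget, seed=0):
    """Sample a sparse sketch from a stream of (row, col, value) non-zeros.

    Assumption: l1_norm (the sum of |value| over all non-zeros) is known up
    front. Each entry is kept with probability p = min(1, budget*|v|/l1_norm)
    and stored as v / p, so the expected value of every sketch entry equals
    the original entry. Work per non-zero is O(1).
    """
    rng = random.Random(seed)
    sketch = {}
    for i, j, v in nonzeros:
        p = min(1.0, budget * abs(v) / l1_norm)
        if rng.random() < p:
            sketch[(i, j)] = v / p  # rescale to keep the sketch unbiased
    return sketch

# Hypothetical toy stream, presented in arbitrary order.
entries = [(0, 0, 4.0), (0, 2, -1.0), (1, 1, 0.5), (1, 3, 2.5), (0, 5, -2.0)]
l1 = sum(abs(v) for _, _, v in entries)
sketch = sketch_stream(entries, l1, budget=2)
```

When no probability is clamped to 1, every kept entry is stored as `sign(v) * l1_norm / budget`, so only a sign needs to be recorded per entry. This is one simple way a sketch can be both sparse and highly compressible; the expected number of kept entries in that case equals `budget`.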
Related resources
Restricted Strong Convexity and Weighted Matrix Completion: Optimal Bounds with Noise
We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to weighted Frobenius norm. Using this property, we obtain as corollaries ...
Spectral Method and Regularized MLE Are Both Optimal for Top-$K$ Ranking
This paper is concerned with the problem of top-K ranking from pairwise comparisons. Given a collection of n items and a few pairwise binary comparisons across them, one wishes to identify the set of K items that receive the highest ranks. To tackle this problem, we adopt the logistic parametric model—the Bradley-Terry-Luce model, where each item is assigned a latent preference score, and where...
Near-optimal Distributions for Data Matrix Sampling
We give near-optimal distributions for the sparsification of large $m \times n$ matrices, where $m \ll n$, for example representing $n$ observations over $m$ attributes. Our algorithms can be applied when the non-zero entries are only available as a stream, i.e., in arbitrary order, and result in matrices which are not only sparse, but whose values are also highly compressible. In particular, algebraic operatio...
Iterative Methods for Detecting Semipositive Matrices
A matrix $A \in \mathbb{R}^{n \times n}$ is said to be semipositive if there exists a positive $x \in \mathbb{R}^n$ such that $Ax$ is positive. Semipositivity generalizes several of the notions of positivity of a matrix, including entrywise positive matrices, diagonally dominant matrices with positive diagonal elements, and P-matrices. Here, we illustrate the geometric nature of the semipositivity property, list some basic facts about ...
Functions Preserving Nonnegativity of Matrices
The main goal of this work is to determine which entire functions preserve nonnegativity of matrices of a fixed order $n$, i.e., to characterize entire functions $f$ with the property that $f(A)$ is entrywise nonnegative for every entrywise nonnegative matrix $A$ of size $n \times n$. Towards this goal, we present a complete characterization of functions preserving nonnegativity of (block) upper triangular matri...